
Workload Scheduling in Distributed Stream Processors using Graph Partitioning



Abstract

With ever increasing data volumes, large compute clusters that process data in a distributed manner have become prevalent in industry. For distributed stream processing platforms (such as Storm), the question of how to distribute workload to the available machines has important implications for the overall performance of the system.

We present a workload scheduling strategy that is based on a graph partitioning algorithm. The scheduler is application agnostic: it collects the communication behavior of running applications and creates schedules by partitioning the resulting communication graph using the METIS graph partitioning software. As we build upon graph partitioning algorithms that have been shown to scale to very large graphs, our approach can cope with topologies with millions of tasks. While the experiments in this paper assume static data loads, our approach could also be used in a dynamic setting.

We implemented our proposed algorithm for the Storm stream processing system and evaluated it on a commodity cluster with up to 80 machines. The evaluation was conducted on four different use cases: three using synthetic data loads and one application that processes real data.

We compared our algorithm against two state-of-the-art scheduler implementations and show that our approach offers significant improvements in terms of resource utilization, enabling higher throughput at reduced network loads. We show that these improvements can be achieved while maintaining a balanced workload in terms of CPU usage and bandwidth consumption across the cluster. We also found that the performance advantage increases with message size, providing an important insight for stream-processing approaches based on micro-batching.
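The core mechanism described above is: measure task-to-task traffic in a running topology, build a weighted communication graph, and partition it with METIS so that heavily communicating tasks are placed on the same worker while the partition sizes stay balanced. The following is a minimal illustrative sketch of that idea, not the authors' Storm scheduler: it assumes a Python environment with the pymetis binding to METIS, and the task names, traffic matrix, and worker count are hypothetical.

```python
# Sketch: partition a task communication graph with METIS (via pymetis).
# Traffic data below is hypothetical; this is not the paper's implementation.
import pymetis

# Observed message rates between tasks of a running topology (hypothetical).
# Key: (source_task, destination_task) -> messages per second.
traffic = {
    ("spout-0", "bolt-a-0"): 5000,
    ("spout-0", "bolt-a-1"): 4800,
    ("bolt-a-0", "bolt-b-0"): 2500,
    ("bolt-a-1", "bolt-b-0"): 2400,
    ("bolt-a-1", "bolt-b-1"): 100,
}

# Index the tasks and build a symmetric, edge-weighted graph in CSR form
# (xadj / adjncy / eweights), the representation METIS expects.
tasks = sorted({t for pair in traffic for t in pair})
index = {t: i for i, t in enumerate(tasks)}

neighbors = {i: {} for i in range(len(tasks))}
for (src, dst), weight in traffic.items():
    u, v = index[src], index[dst]
    neighbors[u][v] = neighbors[u].get(v, 0) + weight
    neighbors[v][u] = neighbors[v].get(u, 0) + weight

xadj, adjncy, eweights = [0], [], []
for i in range(len(tasks)):
    for j, w in sorted(neighbors[i].items()):
        adjncy.append(j)
        eweights.append(w)
    xadj.append(len(adjncy))

# Partition into one part per worker: METIS balances part sizes while
# minimizing the total weight of cut edges, i.e. inter-worker traffic.
num_workers = 2
cut_weight, parts = pymetis.part_graph(
    num_workers, xadj=xadj, adjncy=adjncy, eweights=eweights
)

# A schedule is then simply the mapping task -> worker.
schedule = {task: parts[index[task]] for task in tasks}
print("weight of cut edges:", cut_weight)
for task, worker in schedule.items():
    print(f"{task} -> worker {worker}")
```

In this toy example, the heavily communicating chain spout-0 / bolt-a-* / bolt-b-0 tends to be grouped together, while only lightly used edges are cut, which is the effect the paper exploits to reduce network load.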

